Kaggle Learn: Computer Vision

In this notebook, we teach a computer to see! We introduce the fundamental ideas of computer vision, with the goal of learning how a neural network can "understand" a natural image well enough to solve the same kinds of problems the human visual system can solve.

The neural networks that perform best at this task are called convolutional neural networks (often shortened to convnets or CNNs). Convolution is the mathematical operation that gives the layers of a convnet their unique structure.

We will apply these ideas to the problem of image classification: given a picture, can we train a computer to tell us what it's a picture of? For example, apps that can identify a species of plant from a photograph are image classifiers!

Table Of Contents

  • 0. Dependencies and Settings
  • 1. Useful Links
  • 2. The Convolutional Classifier
    • 2.1. Training a Convnet Classifier
  • 3. Feature Extraction
    • 3.1. Filter with convolution
    • 3.2. Detect with ReLU
    • 3.3. Manual example
  • 4. Pooling
    • 4.1. Maximum pooling
    • 4.2. Translation invariance
    • 4.3. Global Average Pooling
  • 5. The Sliding Window
    • 5.1. Stride
    • 5.2. Padding
    • 5.3. Exploring Sliding Windows
    • 5.4. The Receptive Field
    • 5.5. One-Dimensional Convolution
  • 6. Custom Convnets
    • 6.1. Convolutional Blocks
    • 6.2.
  • 7. Data Augmentation

0 Dependencies and Settings

Installation:

In [1]:
# Installing or upgrading
# Note: might have to restart kernel

# Uncomment:
# import sys

# Installing:
# !{sys.executable} -m pip install scikit-learn
# Upgrading:
# !{sys.executable} -m pip install --upgrade scipy==1.9.0 --user

Imports:

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
import sklearn
In [3]:
print('pandas version:', pd.__version__)
print('numpy version:', np.__version__)
print('tensorflow version:', tf.__version__)
print('sci-kit learn version:', sklearn.__version__)
pandas version: 1.5.2
numpy version: 1.23.0
tensorflow version: 2.11.0
sci-kit learn version: 1.2.0

Set the plotting style:

In [4]:
try:
    scientific_style = [
        '../../Random/PythonTutorialsForDataScience/data/science.mplstyle', 
        '../../Random/PythonTutorialsForDataScience/data/notebook.mplstyle', 
        '../../Random/PythonTutorialsForDataScience/data/grid.mplstyle'
    ]

    plt.style.use(scientific_style)
    
    print('Using Scientific Style.')
except OSError:  # raised when a style file cannot be found
    print('Missing Scientific Style, continuing with default.')
Using Scientific Style.

Define the filepath where most of the data resides:

In [5]:
path = r'C:\Users\seani\Documents\JupyterNotebooks\Kaggle\KaggleLearn\Assets'

Function used to get names of files in a directory:

In [6]:
import os

def get_files(path):
    '''
    Inputs: a path string
    Returns: a list of names of files in a directory
    '''
    
    files = []
    # search through each item in the directory
    for file in os.listdir(path):
        # check it is a file
        if os.path.isfile(os.path.join(path, file)):
            files.append(file)
    
    return files

The seed used throughout for reproducible randomness:

In [7]:
seed = 1

1 Useful Links

  • Tensorflow main page: https://www.tensorflow.org/
  • Tensorflow python API: https://www.tensorflow.org/api_docs/python/tf

2 The Convolutional Classifier

A convnet used for image classification consists of two parts: a convolutional base and a dense head.

The base is used to extract the features from an image. It is formed primarily of layers performing the convolution operation, but often includes other kinds of layers as well. (We'll look at these in later sections.)

The head is used to determine the class of the image. It is formed primarily of dense layers, but might include other layers like dropout.

Figure1.jpg

Figure 1: The input image is passed through the base, which extracts features, and then through the head, which classifies them to reach a prediction (output).

What do we mean by visual feature? A feature could be a line, a color, a texture, a shape, a pattern - or some complicated combination.

The whole process goes something like this:

Note: the features a real network extracts look a bit different, but this gives the idea.

Figure2.jpg

Figure 2: After the image is passed through the base, it is split into its respective features. After passing through the head, these are classified and a prediction is made.

The goal of the network during training is to learn two things:

  1. Which features to extract from an image (base),
  2. Which class goes with what features (head).

These days, convnets are rarely trained from scratch. More often, we reuse the base of a pretrained model. To the pretrained base we then attach an untrained head. In other words, we reuse the part of a network that has already learned to extract features (1.), and attach to it some fresh layers to learn classification (2.).

Reusing a pretrained model is a technique known as transfer learning. It is so effective that almost every image classifier today makes use of it.

2.1 Training a Convnet Classifier

Throughout this notebook, we're going to be creating classifiers that attempt to solve the following problem: is this a picture of a Car or of a Truck? Our dataset is about 10,000 pictures of various automobiles, around half cars and half trucks.

The following cell imports some libraries and sets up our data pipeline. We have a training split called ds_train and a validation split called ds_valid.

In [8]:
# imports
import os, warnings
from matplotlib import gridspec
from tensorflow.keras.preprocessing import image_dataset_from_directory

warnings.filterwarnings("ignore") # to clean up output cells

# reproducibility
np.random.seed(seed)
tf.random.set_seed(seed)

# load training and validation sets
ds_train_ = image_dataset_from_directory(
    'Assets/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
    'Assets/car-or-truck/valid',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

ds_train
Found 5117 files belonging to 2 classes.
Found 5051 files belonging to 2 classes.
Out[8]:
<PrefetchDataset element_spec=(TensorSpec(shape=(None, 128, 128, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 1), dtype=tf.float32, name=None))>

The most commonly used dataset for pretraining is ImageNet, a large dataset of many kinds of natural images. Keras includes a variety of models pretrained on ImageNet in its applications module.

One base we could use comes from a model called VGG16. However, VGG16 tends to overfit this dataset. Throughout this notebook, we'll learn a number of ways to improve on such an initial attempt.

The first improvement we'll see is to use a base more appropriate to the dataset. The base used here comes from a model called InceptionV1 (also known as GoogLeNet). InceptionV1 was one of the early winners of the ImageNet competition. One of its successors, InceptionV4, is among the state of the art today.

The InceptionV1 model pretrained on ImageNet is available in the TensorFlow Hub repository, but we'll load it from a local copy:

Note: when doing transfer learning, it's generally not a good idea to retrain the entire base, at least not without some care. The reason is that the random weights in the head will initially create large gradient updates, which propagate back into the base layers and destroy much of the pretraining. Using techniques known as fine-tuning, it's possible to further train the base on new data, but this requires some care to do well.

In [9]:
pretrained_base = tf.keras.models.load_model(
    'models/inceptionv1'
)

pretrained_base.trainable = False  # freeze the base so its pretrained weights are not updated

pretrained_base
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Out[9]:
<keras.engine.sequential.Sequential at 0x2ae8aeb8250>

Next, we attach the classifier head. For this example, we'll use a layer of hidden units (the first Dense layer) followed by a sigmoid layer that transforms the outputs into a probability score for class 1, Truck. The Flatten layer unstacks the output of the base (a stack of 2D feature maps) into the one-dimensional input vector needed by the head.

In [10]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # Base
    pretrained_base,
    layers.Flatten(),  # flatten outputs of base
    # Head
    layers.Dense(units=6, activation='relu'),
    # Output
    layers.Dense(units=1, activation='sigmoid'),
])

Finally, let's train the model. Since this is a two-class problem, we'll use the binary versions of cross-entropy and accuracy. The Adam optimizer generally performs well, so we'll choose it as well:

In [11]:
# create optimizer
optimizer = tf.keras.optimizers.Adam(epsilon=0.01)

# compile model
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=30,
)
Epoch 1/30
80/80 [==============================] - 99s 1s/step - loss: 0.5070 - binary_accuracy: 0.7559 - val_loss: 0.4139 - val_binary_accuracy: 0.8165
Epoch 2/30
80/80 [==============================] - 77s 971ms/step - loss: 0.4122 - binary_accuracy: 0.8153 - val_loss: 0.3834 - val_binary_accuracy: 0.8333
Epoch 3/30
80/80 [==============================] - 80s 1s/step - loss: 0.3873 - binary_accuracy: 0.8313 - val_loss: 0.3692 - val_binary_accuracy: 0.8418
Epoch 4/30
80/80 [==============================] - 80s 1s/step - loss: 0.3708 - binary_accuracy: 0.8384 - val_loss: 0.3599 - val_binary_accuracy: 0.8454
Epoch 5/30
80/80 [==============================] - 86s 1s/step - loss: 0.3579 - binary_accuracy: 0.8435 - val_loss: 0.3531 - val_binary_accuracy: 0.8509
Epoch 6/30
80/80 [==============================] - 80s 1s/step - loss: 0.3473 - binary_accuracy: 0.8501 - val_loss: 0.3478 - val_binary_accuracy: 0.8543
Epoch 7/30
80/80 [==============================] - 79s 995ms/step - loss: 0.3383 - binary_accuracy: 0.8538 - val_loss: 0.3437 - val_binary_accuracy: 0.8549
Epoch 8/30
80/80 [==============================] - 85s 1s/step - loss: 0.3305 - binary_accuracy: 0.8556 - val_loss: 0.3403 - val_binary_accuracy: 0.8567
Epoch 9/30
80/80 [==============================] - 88s 1s/step - loss: 0.3236 - binary_accuracy: 0.8595 - val_loss: 0.3376 - val_binary_accuracy: 0.8577
Epoch 10/30
80/80 [==============================] - 86s 1s/step - loss: 0.3175 - binary_accuracy: 0.8638 - val_loss: 0.3353 - val_binary_accuracy: 0.8588
Epoch 11/30
80/80 [==============================] - 80s 1000ms/step - loss: 0.3120 - binary_accuracy: 0.8677 - val_loss: 0.3335 - val_binary_accuracy: 0.8588
Epoch 12/30
80/80 [==============================] - 79s 998ms/step - loss: 0.3069 - binary_accuracy: 0.8693 - val_loss: 0.3319 - val_binary_accuracy: 0.8586
Epoch 13/30
80/80 [==============================] - 78s 982ms/step - loss: 0.3023 - binary_accuracy: 0.8714 - val_loss: 0.3307 - val_binary_accuracy: 0.8590
Epoch 14/30
80/80 [==============================] - 84s 1s/step - loss: 0.2979 - binary_accuracy: 0.8738 - val_loss: 0.3297 - val_binary_accuracy: 0.8600
Epoch 15/30
80/80 [==============================] - 78s 980ms/step - loss: 0.2934 - binary_accuracy: 0.8763 - val_loss: 0.3291 - val_binary_accuracy: 0.8600
Epoch 16/30
80/80 [==============================] - 82s 1s/step - loss: 0.2894 - binary_accuracy: 0.8788 - val_loss: 0.3282 - val_binary_accuracy: 0.8626
Epoch 17/30
80/80 [==============================] - 88s 1s/step - loss: 0.2857 - binary_accuracy: 0.8794 - val_loss: 0.3276 - val_binary_accuracy: 0.8632
Epoch 18/30
80/80 [==============================] - 87s 1s/step - loss: 0.2819 - binary_accuracy: 0.8812 - val_loss: 0.3269 - val_binary_accuracy: 0.8646
Epoch 19/30
80/80 [==============================] - 87s 1s/step - loss: 0.2787 - binary_accuracy: 0.8822 - val_loss: 0.3269 - val_binary_accuracy: 0.8646
Epoch 20/30
80/80 [==============================] - 87s 1s/step - loss: 0.2755 - binary_accuracy: 0.8849 - val_loss: 0.3271 - val_binary_accuracy: 0.8642
Epoch 21/30
80/80 [==============================] - 80s 1s/step - loss: 0.2725 - binary_accuracy: 0.8868 - val_loss: 0.3272 - val_binary_accuracy: 0.8650
Epoch 22/30
80/80 [==============================] - 80s 1s/step - loss: 0.2690 - binary_accuracy: 0.8878 - val_loss: 0.3275 - val_binary_accuracy: 0.8668
Epoch 23/30
80/80 [==============================] - 81s 1s/step - loss: 0.2659 - binary_accuracy: 0.8902 - val_loss: 0.3279 - val_binary_accuracy: 0.8652
Epoch 24/30
80/80 [==============================] - 82s 1s/step - loss: 0.2629 - binary_accuracy: 0.8921 - val_loss: 0.3282 - val_binary_accuracy: 0.8650
Epoch 25/30
80/80 [==============================] - 82s 1s/step - loss: 0.2598 - binary_accuracy: 0.8941 - val_loss: 0.3288 - val_binary_accuracy: 0.8650
Epoch 26/30
80/80 [==============================] - 82s 1s/step - loss: 0.2574 - binary_accuracy: 0.8958 - val_loss: 0.3290 - val_binary_accuracy: 0.8650
Epoch 27/30
80/80 [==============================] - 84s 1s/step - loss: 0.2549 - binary_accuracy: 0.8964 - val_loss: 0.3303 - val_binary_accuracy: 0.8654
Epoch 28/30
80/80 [==============================] - 82s 1s/step - loss: 0.2520 - binary_accuracy: 0.8972 - val_loss: 0.3299 - val_binary_accuracy: 0.8660
Epoch 29/30
80/80 [==============================] - 81s 1s/step - loss: 0.2493 - binary_accuracy: 0.8996 - val_loss: 0.3311 - val_binary_accuracy: 0.8668
Epoch 30/30
80/80 [==============================] - 81s 1s/step - loss: 0.2469 - binary_accuracy: 0.8994 - val_loss: 0.3310 - val_binary_accuracy: 0.8664
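For reference, here is a minimal NumPy sketch of what the binary cross-entropy loss and the binary accuracy metric compute for a batch of sigmoid outputs. It uses the textbook formulas, clipping predictions by a small epsilon (assumed to be 1e-7 here) to avoid log(0). It is not Keras's own implementation, and the labels and predictions are made-up values.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return losses.mean()

def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions that land on the correct side of the threshold."""
    y_true = np.asarray(y_true)
    y_hat = (np.asarray(y_pred) > threshold).astype(y_true.dtype)
    return (y_hat == y_true).mean()

# Hypothetical labels (1 = Truck, 0 = Car) and sigmoid outputs
y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.4, 0.1])

print(binary_crossentropy(y_true, y_pred))  # about 0.3375
print(binary_accuracy(y_true, y_pred))      # 0.75: the third prediction is wrong
```

A confidently wrong prediction is penalized heavily by the loss, while accuracy counts it as just one mistake, which is why the loss and accuracy curves in the history can move somewhat independently.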

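model.fit returns a History object whose history attribute is a dictionary of per-epoch losses and metrics. We can use Pandas to convert this dictionary to a DataFrame and plot it with its built-in plot method: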

In [12]:
import pandas as pd
history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();

The training and validation losses stay fairly close (about 0.25 and 0.33 after 30 epochs), which is evidence that the model isn't just memorizing the training data but is learning general properties of the two classes. The validation loss levels off around epoch 18 and creeps up slightly afterwards, so more epochs would add little. However, because this model converges at a higher loss than the VGG16 model (test this for yourself), it is likely underfitting somewhat and could benefit from some extra capacity.

3 Feature Extraction

In this section, we're going to learn about one of the two most important types of layers that you'll usually find in the base of a convolutional image classifier. This is the convolutional layer with ReLU activation (the other layer is the maximum pooling layer, discussed later).

Before we get into the details of convolution, let's discuss the purpose of these layers in the network. We're going to see how these three operations (convolution, ReLU, and maximum pooling) are used to implement the feature extraction process:

  1. Filter an image for a particular feature (convolution)
  2. Detect that feature within the filtered image (ReLU)
  3. Condense the image to enhance the features (maximum pooling)

The next figure illustrates this process. You can see how these three operations are able to isolate some particular characteristic of the original image (in this case, horizontal lines):

Figure3.jpg

Figure 3: The three steps of feature extraction acting upon the input image. Notice how convolution filters the image for the features, ReLU allows detection of desired features, and maximum pooling enhances these.

Typically, the network will perform several extractions in parallel on a single image. In modern convnets, it's not uncommon for the final layer in the base to be producing over 1000 unique visual features.

3.1 Filter with convolution

A convolutional layer carries out the filtering step. You might define one in a Keras model as in the following cell; we can understand its parameters by looking at how they relate to the weights and activations of the layer:

In [13]:
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3), # activation is None
    # More layers follow
])

The weights a convnet learns during training are primarily contained in its convolutional layers. We call these weights kernels, and we can represent them as small arrays. The following are examples of kernels (with their effects):

Figure4.jpg

Figure 4: Examples of kernels with the effects they have on the image.

A kernel operates by scanning over an image and producing a weighted sum of pixel values. In this way, a kernel will act sort of like a polarized lens, emphasizing or deemphasizing certain patterns of information.

Figure5.jpg

Figure 5: The kernel is applied at each pixel in turn. Each cell of the kernel sits over one pixel of the image, and the kernel value is multiplied by the pixel value (for example, a grayscale intensity). All of these products are summed, and the sum becomes the value of the output pixel at the center of the window.
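The sliding weighted sum described above can be sketched directly in NumPy. This is an illustrative loop, not how TensorFlow implements it. Like tf.nn.conv2d it does not flip the kernel, and it assumes stride 1 and no padding (the 'VALID' setting). Applied to the 6×6 array and 2×2 kernel from the manual example in section 3.3, it should reproduce the "After convolution" output printed there.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` one pixel at a time (no padding) and
    return the weighted sum at each position. The kernel is not flipped,
    matching tf.nn.conv2d (strictly, a cross-correlation)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]   # pixels currently under the kernel
            out[i, j] = np.sum(patch * kernel)  # multiply cell by cell, then add
    return out

# The 6x6 array and 2x2 kernel from the manual example in section 3.3
example_image = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
])
example_kernel = np.array([
    [1, -1],
    [1, -1],
])

result = conv2d_valid(example_image, example_kernel)
print(result)
```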

Kernels define how a convolutional layer is connected to the layer that follows. The kernel above will connect each neuron in the output to nine neurons in the input. By setting the dimensions of the kernels with kernel_size, you are telling the convnet how to form these connections. Most often, a kernel will have odd-numbered dimensions (like kernel_size=(3, 3) or (5, 5)) so that a single pixel sits at the center, but this is not a requirement.
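The connection count and the size of the output follow from simple arithmetic. A minimal sketch, assuming stride 1 and no padding (both covered in section 5): each output value corresponds to one position of the kernel, and each position reads kernel_h × kernel_w values per input channel. The function names are illustrative only.

```python
def valid_output_size(n, k):
    """Side length of the output when a k x k kernel slides over an n x n
    input one pixel at a time with no padding: one output per kernel position."""
    return n - k + 1

def inputs_per_output(kernel_h, kernel_w, in_channels=1):
    """How many input values feed each output neuron."""
    return kernel_h * kernel_w * in_channels

print(valid_output_size(6, 2))                 # 5, as in the 6x6 array example of section 3.3
print(valid_output_size(128, 3))               # 126
print(inputs_per_output(3, 3))                 # 9 for a single-channel (grayscale) input
print(inputs_per_output(3, 3, in_channels=3))  # 27 for an RGB input
```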

The kernels in a convolutional layer determine what kinds of features it creates. During training, a convnet tries to learn what features it needs to solve the classification problem. This means finding the best values for its kernels.

See the video "But what is a convolution?" by 3Blue1Brown for further explanation: https://www.youtube.com/watch?v=KuXjwB4LzSA&t=891s&ab_channel=3Blue1Brown

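Note: in mathematics, convolution flips the kernel before sliding it over the input, whereas deep learning libraries (including TensorFlow) usually skip the flip, so strictly speaking they compute a cross-correlation. The convention varies between fields and applications.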
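The flip is easy to see in one dimension with NumPy: np.convolve flips the kernel (the mathematical convolution), while np.correlate slides it as written (cross-correlation, the operation layers like Keras's Conv2D compute). The signal and kernel are arbitrary example values.

```python
import numpy as np

signal = np.array([0, 1, 2, 3, 4])   # a steadily increasing 1-D "image"
kernel_1d = np.array([1, 0, -1])

conv = np.convolve(signal, kernel_1d, mode='valid')                 # flips the kernel first
corr = np.correlate(signal, kernel_1d, mode='valid')                # slides it as written
corr_flipped = np.correlate(signal, kernel_1d[::-1], mode='valid')  # flip by hand

print(conv)          # [2 2 2]
print(corr)          # [-2 -2 -2]
print(corr_flipped)  # [2 2 2]: correlating with a flipped kernel equals convolution
```

For learned kernels the distinction hardly matters: the network simply learns the flipped version of whatever kernel it needs.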

Feature maps are what result when we apply a filter to an image; they contain the visual features the kernel extracts. Here are a few kernels pictured with the feature maps they produced:

Figure6.jpg

Figure 6: The input image with different kernels applied to it.

From the pattern of numbers in the kernel, you can tell the kinds of feature maps it creates. Generally, what a convolution accentuates in its inputs will match the shape of the positive numbers in the kernel (under the unflipped convention used by deep learning libraries). The left and middle kernels above will both filter for horizontal shapes.

With the filters parameter, you tell the convolutional layer how many feature maps you want it to create as output.
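Together, filters and kernel_size fix how many weights a convolutional layer learns: one kernel of size kernel_h × kernel_w × in_channels per filter, plus one bias per filter (Keras's Conv2D uses a bias by default). The helper below is a hypothetical sketch of that count, assuming the Conv2D(filters=64, kernel_size=3) layer above receives an RGB input with 3 channels.

```python
def conv2d_param_count(filters, kernel_size, in_channels, use_bias=True):
    """Trainable parameters of a 2-D convolutional layer."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    kh, kw = kernel_size
    weights = filters * kh * kw * in_channels  # one kh x kw x in_channels kernel per filter
    biases = filters if use_bias else 0        # one bias per filter
    return weights + biases

print(conv2d_param_count(64, 3, in_channels=3))   # 1792
print(conv2d_param_count(64, 3, in_channels=64))  # 36928
```

The second call is a 64-filter layer stacked on top of the first: its input now has 64 channels, so each kernel is 3 × 3 × 64 and the count grows accordingly.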

3.2 Detect with ReLU

Recall the ReLU activation function:

In [14]:
x = np.linspace(-3, 3, 100)
y = np.maximum(0, x)
plt.plot(x, y)
plt.xlabel('input')
plt.ylabel('output')
plt.title('ReLU activation function')
Out[14]:
Text(0.5, 1.0, 'ReLU activation function')

The ReLU activation can be defined in its own Activation layer, but most often you'll just include it as the activation function of Conv2D:

In [15]:
model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3, activation='relu')
    # More layers follow
])

You could think about the activation function as scoring pixel values according to some measure of importance. The ReLU activation says that negative values are not important and so sets them to 0. ("Everything unimportant is equally unimportant.")

Here is ReLU applied to the feature maps above. Notice how it succeeds at isolating the features:

Figure7.jpg

Figure 7: Same as Figure 6 but now with ReLU applied. Notice how the desired features are now detected and isolated.

3.3 Manual example

Let's now apply these operations to an example image taken from the Car or Truck dataset:

In [16]:
# get path to image
image_path = 'Assets/car-or-truck/train/Car/00268.jpeg'
# read it and preprocess
image = tf.io.read_file(image_path)
image = tf.io.decode_jpeg(image, channels=1)
image = tf.image.resize(image, size=[400, 400])  # resize for readability

# plot with matplotlib
img = tf.squeeze(image).numpy()
plt.figure(figsize=(6, 6))
plt.imshow(img, cmap='gray')
plt.axis('off')
Out[16]:
(-0.5, 399.5, 399.5, -0.5)

Now let's define the kernel to use. One thing to keep in mind is that the sum of the numbers in the kernel determines how bright the final image is. Generally, you should try to keep the sum of the numbers between 0 and 1 (though that's not required).
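Why the sum controls brightness: on a perfectly flat region every window contains the same pixel value, so the weighted sum is just that value times the sum of the kernel. The sketch below checks this on a uniform grey patch using NumPy's sliding_window_view (available in NumPy 1.20 and later); the kernels are made-up examples.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def filter_valid(image, kernel):
    """Weighted sum over every kernel-sized window (stride 1, no padding)."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-kh+1, W-kw+1, kh, kw)
    return np.einsum('ijkl,kl->ij', windows, kernel)

grey = np.full((5, 5), 0.5)  # a flat grey patch with pixel value 0.5

kernels = {
    'sums to 1': np.full((3, 3), 1 / 9),            # averaging kernel
    'sums to 2': np.full((3, 3), 2 / 9),            # doubled averaging kernel
    'sums to 0': np.array([[-1.0, 0.0, 1.0]] * 3),  # vertical-edge kernel
}

for name, kernel_3x3 in kernels.items():
    out = filter_valid(grey, kernel_3x3)
    # On a flat patch every output equals 0.5 * kernel_3x3.sum()
    print(f'{name}: output value = {out.mean():.2f}')
```

A kernel that sums to 0 sends flat regions to 0 (black) and responds only where pixel values change, which is why edge-detecting kernels produce mostly dark feature maps.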

In general, a kernel can have any number of rows and columns. For this exercise, let's use a $3 \times 3$ kernel, a common choice. We define the kernel with tf.constant. This one takes the difference between the pixels to the right and to the left of each position, so it emphasizes vertical edges where the image gets brighter from left to right:

In [17]:
kernel = tf.constant([
    [-0.25, 0, 0.25],
    [-0.25, 0, 0.25],
    [-0.25, 0, 0.25],
])

Before applying the kernel, we need to reshape the image and the kernel into the formats TensorFlow expects:

In [18]:
# Reformat for batch compatibility.
image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # ensure float32 pixel values
image = tf.expand_dims(image, axis=0)  # add a batch dimension: [1, height, width, channels]
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])  # [height, width, in_channels, out_channels]
kernel = tf.cast(kernel, dtype=tf.float32)  # convert kernel to floats
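The reshaping above follows the layouts tf.nn.conv2d expects: an input of shape [batch, height, width, channels] and filters of shape [filter_height, filter_width, in_channels, out_channels]. The NumPy sketch below only mirrors those shape changes on placeholder arrays of the same sizes, so the layouts are easy to inspect without running TensorFlow; it uses new names so the notebook's own image and kernel are untouched.

```python
import numpy as np

# Placeholders with the same shapes as the notebook's tensors (values don't matter here)
gray_image = np.zeros((400, 400, 1), dtype=np.float32)            # [height, width, channels]
kernel_2d = np.array([[-0.25, 0.0, 0.25]] * 3, dtype=np.float32)  # [kernel_height, kernel_width]

batched = np.expand_dims(gray_image, axis=0)           # [batch, height, width, channels]
kernel_4d = kernel_2d.reshape(*kernel_2d.shape, 1, 1)  # [kh, kw, in_channels, out_channels]

print(batched.shape)    # (1, 400, 400, 1)
print(kernel_4d.shape)  # (3, 3, 1, 1)
```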

Instead of using the Keras layer for the convolution, we will use the lower-level TensorFlow function that performs the same operation, tf.nn.conv2d():

Note: the strides and padding arguments will be explained later.

In [19]:
image_filter = tf.nn.conv2d(
    input=image,
    filters=kernel,
    strides=1, # or (1, 1)
    padding='SAME',
)

plt.imshow(
    # Reformat for plotting
    tf.squeeze(image_filter)
)
plt.axis('off')
Out[19]:
(-0.5, 399.5, 399.5, -0.5)

Now let's apply the ReLU function, using the corresponding TensorFlow function, tf.nn.relu():

In [20]:
image_detect = tf.nn.relu(image_filter)

plt.imshow(
    # Reformat for plotting
    tf.squeeze(image_detect)
)
plt.axis('off')
Out[20]:
(-0.5, 399.5, 399.5, -0.5)

Notice the vertical features on the right edge have been isolated.

It can also be useful to apply these operations to a small array, where we can see exactly how they affect each input value:

In [21]:
image = np.array([
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0, 0],
])

kernel = np.array([
    [1, -1],
    [1, -1],
])

print('Input array:')
display(image)
print('Kernel:')
display(kernel)

# Reformat for Tensorflow
image = tf.cast(image, dtype=tf.float32)
image = tf.reshape(image, [1, *image.shape, 1])
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])
kernel = tf.cast(kernel, dtype=tf.float32)

image_filter = tf.nn.conv2d(
    input=image,
    filters=kernel,
    strides=1,
    padding='VALID',
)
image_detect = tf.nn.relu(image_filter)

# The first matrix is the image after convolution, and the second is
# the image after ReLU.
print('After convolution:')
display(tf.squeeze(image_filter).numpy())
print('After ReLU:')
display(tf.squeeze(image_detect).numpy())
Input array:
array([[0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0],
       [0, 1, 0, 1, 1, 1],
       [0, 1, 0, 0, 0, 0]])
Kernel:
array([[ 1, -1],
       [ 1, -1]])
After convolution:
array([[-2.,  2.,  0.,  0.,  0.],
       [-2.,  2.,  0.,  0.,  0.],
       [-2.,  2.,  0.,  0.,  0.],
       [-2.,  2., -1.,  0.,  0.],
       [-2.,  2., -1.,  0.,  0.]], dtype=float32)
After ReLU:
array([[0., 2., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 2., 0., 0., 0.],
       [0., 2., 0., 0., 0.]], dtype=float32)

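Notice how only the vertical line was preserved: its right-hand edge produces positive responses, while the negative responses (including those from the horizontal line) are set to zero by ReLU.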

Below are some commonly used kernels:

In [22]:
# PREDEFINED KERNELS #

# Edge detection
edge = tf.constant(
    [[-1, -1, -1],
     [-1, 8, -1],
     [-1, -1, -1]],
)

# Blur
blur = tf.constant(
    [[0.0625, 0.125, 0.0625],
     [0.125, 0.25, 0.125],
     [0.0625, 0.125, 0.0625]],
)

# Bottom sobel
bottom_sobel = tf.constant(
    [[-1, -2, -1],
     [0, 0, 0],
     [1, 2, 1]],
)

# Emboss South-East
emboss = tf.constant(
    [[-2, -1, 0],
     [-1, 1, 1],
     [0, 1, 2]],
)

# Sharpen
sharpen = tf.constant(
    [[0, -1, 0],
     [-1, 5, -1],
     [0, -1, 0]],
)
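Following the rule of thumb from section 3.3 that a kernel's sum sets the brightness of its output, it is worth checking these sums. The sketch below recomputes them with NumPy, copying the values from the cell above: the edge and Sobel kernels sum to 0, so they respond only to changes in intensity, while blur, emboss and sharpen sum to 1 and roughly preserve overall brightness.

```python
import numpy as np

# Same values as the predefined kernels above
kernels = {
    'edge': [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]],
    'blur': [[0.0625, 0.125, 0.0625], [0.125, 0.25, 0.125], [0.0625, 0.125, 0.0625]],
    'bottom_sobel': [[-1, -2, -1], [0, 0, 0], [1, 2, 1]],
    'emboss': [[-2, -1, 0], [-1, 1, 1], [0, 1, 2]],
    'sharpen': [[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
}

sums = {name: float(np.sum(k)) for name, k in kernels.items()}
for name, total in sums.items():
    print(f'{name:>12}: sum = {total:g}')
# edge and bottom_sobel sum to 0; blur, emboss and sharpen sum to 1
```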

4 Pooling

In this section, we'll look at the third (and final) operation in the sequence: condense with maximum pooling, which Keras implements with the MaxPool2D layer. We will also look at global average pooling, which condenses the feature maps even further and reduces the number of parameters in the model; it is still widely used in the head of a convnet.

4.1 Maximum pooling

Adding the condensing step to our earlier model gives us this:

In [23]:
model = keras.Sequential([
    layers.Conv2D(filters=64, kernel_size=3, activation='relu'),  # filter and detect
    layers.MaxPool2D(pool_size=2),
    # More layers follow
])

A MaxPool2D layer is much like a Conv2D layer, except that it uses a simple maximum function instead of a kernel, with the pool_size parameter analogous to kernel_size. Unlike a convolutional layer, however, a MaxPool2D layer has no trainable weights.

Let's take another look at the extraction figure from the last section, Figure 3. Remember that MaxPool2D is the Condense step. Notice that after applying the ReLU function (Detect) the feature map ends up with a lot of "dead space," that is, large areas containing only 0's (the black areas in the image). Having to carry these 0 activations through the entire network would increase the size of the model without adding much useful information. Instead, we would like to condense the feature map to retain only the most useful part: the feature itself.

This is exactly what maximum pooling does: it takes a patch of activations in the original feature map and replaces it with the maximum activation in that patch.

Note, from ChatGPT: The feature maps represent the activations (outputs) of the previous layer, typically a convolutional layer. These activations are the result of applying a set of filters to the input image, and they represent the features of the image at different levels of abstraction.

Figure8.jpg

Figure 8: An example of maximum pooling applied to a feature map. The original is split into patches, and each patch is replaced by its maximum value. Although this discards some detail, it greatly condenses the feature map and so reduces the number of values the network has to carry forward.

When applied after the ReLU activation, it has the effect of "intensifying" features. The pooling step increases the proportion of active pixels to zero pixels, and thus condenses the feature map.
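A minimal NumPy sketch of 2 × 2 max pooling with stride 2 (the same window and stride as the tf.nn.pool call used later in this section), applied to a small made-up post-ReLU feature map. It also measures the fraction of nonzero ("active") pixels before and after pooling, to make the intensifying effect concrete. The reshape trick assumes even height and width; any odd trailing row or column is dropped.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    h, w = fmap.shape
    h2, w2 = h // 2, w // 2
    # Group pixels into non-overlapping 2x2 blocks; axes 1 and 3 index inside a block
    blocks = fmap[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

def active_fraction(fmap):
    """Share of pixels that are nonzero."""
    return np.count_nonzero(fmap) / fmap.size

# Hypothetical post-ReLU feature map: a short vertical stroke in a sea of zeros
detect = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0],
    [0, 0, 3, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
], dtype=float)

condensed = max_pool_2x2(detect)
print(condensed)
# [[0. 2. 0.]
#  [0. 3. 0.]
#  [0. 0. 0.]]
print(f'active before: {active_fraction(detect):.3f}, after: {active_fraction(condensed):.3f}')
```

The stroke survives as its strongest values, while the share of active pixels rises from 3 in 36 to 2 in 9.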

Let's examine this with the car image from section 3.3. The following code recreates the filter and detect steps:

In [24]:
# get path to image
image_path = 'Assets/car-or-truck/train/Car/00268.jpeg'
# read it and preprocess
image = tf.io.read_file(image_path)
image = tf.io.decode_jpeg(image, channels=1)
image = tf.image.resize(image, size=[400, 400])  # resize for readability

# Define kernel
kernel = tf.constant([
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
], dtype=tf.float32)

# Reformat for batch compatibility.
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
image = tf.expand_dims(image, axis=0)
kernel = tf.reshape(kernel, [*kernel.shape, 1, 1])

# Filter step
image_filter = tf.nn.conv2d(
    input=image,
    filters=kernel,
    # we'll talk about these two in the next section
    strides=1,
    padding='SAME'
)

# Detect step
image_detect = tf.nn.relu(image_filter)

The TensorFlow function that corresponds to the MaxPool2D layer is tf.nn.pool() with pooling_type='MAX':

In [25]:
image_condense = tf.nn.pool(
    input=image_detect, # image in the Detect step above
    window_shape=(2, 2),
    pooling_type='MAX',
    # we'll see what these do in the next section
    strides=(2, 2),
    padding='SAME',
)

Finally, we can show each operation to compare:

In [26]:
# Show what we have so far
plt.figure(figsize=(16, 6))
plt.subplot(141)
plt.imshow(tf.squeeze(image))
plt.axis('off')
plt.title('Input')
plt.subplot(142)
plt.imshow(tf.squeeze(image_filter))
plt.axis('off')
plt.title('Filter')
plt.subplot(143)
plt.imshow(tf.squeeze(image_detect))
plt.axis('off')
plt.title('Detect')
plt.subplot(144)
plt.imshow(tf.squeeze(image_condense))
plt.axis('off')
plt.title('Condense (MaxPool)')
Out[26]:
Text(0.5, 1.0, 'Condense (MaxPool)')

Notice how the condense step further intensified the features.

4.2 Translation invariance

We called the zero-pixels "unimportant". Does this mean they carry no information at all? In fact, the zero-pixels carry positional information. The blank space still positions the feature within the image. When MaxPool2D removes some of these pixels, it removes some of the positional information in the feature map. This gives a convnet a property called translation invariance. This means that a convnet with maximum pooling will tend not to distinguish features by their location in the image.

Watch what happens when we repeatedly apply maximum pooling to the following feature map:

Figure9.jpg

Figure 9: Max pooling makes the two dots indistinguishable: positional information has been lost.

The two dots in the original image became indistinguishable after repeated pooling. In other words, pooling destroyed some of their positional information. Since the network can no longer distinguish between them in the feature maps, it can't distinguish them in the original image either: it has become invariant to that difference in position.

However, pooling only creates translation invariance in a network over small distances, as with the two dots in the image. Features that begin far apart will remain distinct after pooling; only some of the positional information was lost, but not all of it:

Figure10.jpg

Figure 10: Although some positional information can be lost as seen in Figure 9, not all is. The two dots are far enough apart that after maxpooling they are still distinguishable.
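The same effect can be reproduced numerically. In this sketch, an 8 × 8 made-up feature map holds a single active pixel and is pooled with the same 2 × 2, stride-2 max pooling as before. A dot shifted by one pixel becomes identical to the original after pooling, while a dot shifted by six pixels stays distinguishable.

```python
import numpy as np

def max_pool_2x2(fmap):
    """2x2 max pooling with stride 2 on a 2-D feature map."""
    h, w = fmap.shape
    return fmap[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def pool_n_times(fmap, n):
    for _ in range(n):
        fmap = max_pool_2x2(fmap)
    return fmap

def dot_map(row, col, size=8):
    """Hypothetical feature map with a single active pixel."""
    fmap = np.zeros((size, size))
    fmap[row, col] = 1.0
    return fmap

original = dot_map(0, 0)
nearby = dot_map(1, 1)  # shifted by one pixel diagonally
far = dot_map(0, 6)     # shifted six pixels to the right

results = {}
for n in (1, 2):
    same_near = np.array_equal(pool_n_times(original, n), pool_n_times(nearby, n))
    same_far = np.array_equal(pool_n_times(original, n), pool_n_times(far, n))
    results[n] = (same_near, same_far)
    print(f'after {n} pooling step(s): nearby identical = {same_near}, far identical = {same_far}')
```

Pool a third time and the 8 × 8 map shrinks to a single value, at which point even the far dot is indistinguishable: what counts as a "small" distance depends on how much pooling the network applies.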

This invariance to small differences in the positions of features is a nice property for an image classifier to have. Just because of differences in perspective or framing, the same kind of feature might be positioned in various parts of the original image, but we would still like for the classifier to recognize that they are the same. Because this invariance is built into the network, we can get away with using much less data for training: we no longer have to teach it to ignore that difference. This gives convolutional networks a big efficiency advantage over a network with only dense layers.

For example, consider a feature map in the shape of a circle. The idea is that the final image in the following figure ("MaxPool 4") would remain the same if the circle were shifted slightly to the left, upwards, and so on:

Figure11.jpg

Figure 11: Repeated max pooling reduces the circle to such a small, fixed shape that it would look the same even if the circle were shifted around the image. Positional information is removed (a disadvantage), but the feature map is condensed (an advantage).

4.3 Global Average Pooling¶

We mentioned earlier that average pooling has largely been superseded by maximum pooling within the convolutional base. There is, however, a kind of average pooling that is still widely used in the head of a convnet: global average pooling. A GlobalAvgPool2D layer is often used as an alternative to some or all of the hidden Dense layers in the head of the network, like so:

In [27]:
model = keras.Sequential([
    pretrained_base,
    layers.GlobalAvgPool2D(),
    layers.Dense(1, activation='sigmoid'),
])

What is this layer doing? Notice that we no longer have the Flatten layer that usually comes after the base to transform the 2D feature data to 1D data needed by the classifier. Now the GlobalAvgPool2D layer is serving this function. But, instead of "unstacking" the feature (like Flatten), it simply replaces the entire feature map with its average value. Though very destructive, it often works quite well and has the advantage of reducing the number of parameters in the model.

Let's look at what GlobalAvgPool2D does on some randomly generated feature maps. This will help us to understand how it can "flatten" the stack of feature maps produced by the base:

Figure12.jpg

Figure 12: The feature maps (top) are pooled (bottom). Each cell of the pooled feature maps is the average of all the values of the corresponding feature map.

Since each of the $5 \times 5$ feature maps was reduced to a single value, global pooling reduced the number of values needed to represent these features by a factor of $25$, so a Dense layer placed after it needs $25$ times fewer input weights: a substantial saving!
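To make the factor of $25$ concrete, here is a small NumPy sketch (not the Keras layers themselves) comparing what Flatten and GlobalAvgPool2D produce for one image. It assumes a hypothetical stack of 8 random $5 \times 5$ feature maps in channels-last layout, and counts the parameters of a `Dense(1)` layer placed after each:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stack of 8 random feature maps, each 5 x 5, stored channels-last
# (height, width, channels) as a Keras Conv2D layer would output for one image.
feature_maps = rng.random((5, 5, 8))

# Flatten "unstacks" every pixel of every map into one long vector.
flattened = feature_maps.reshape(-1)

# Global average pooling replaces each whole map with its mean value.
pooled = feature_maps.mean(axis=(0, 1))

def dense_params(n_inputs, units):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_inputs * units + units

print("Flatten:", flattened.shape, "-> Dense(1) params:", dense_params(flattened.size, 1))
print("GlobalAvgPool2D:", pooled.shape, "-> Dense(1) params:", dense_params(pooled.size, 1))
```

Each entry of `pooled` is simply the mean of one feature map, which is the "score" interpretation used further below.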

Now we'll move on to understanding the pooled features. After we've pooled the features into just a single value, does the head still have enough information to determine a class? Let's pass some images from our Car or Truck dataset through VGG16 and examine the features that result after pooling. First we define the model and load the dataset:

In [28]:
# get the pretrained base
pretrained_base = tf.keras.models.load_model(
    'models/vgg16'
)

pretrained_base.trainable = False

# define the model
model = keras.Sequential([
    pretrained_base,
    # Attach a global average pooling layer after the base
    layers.GlobalAvgPool2D(),
])

# load training and validation sets
ds = image_dataset_from_directory(
    'Assets/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=1,
    shuffle=True,
)

ds_iter = iter(ds)  # allows us to iterate through the set
WARNING:tensorflow:SavedModel saved prior to TF 2.5 detected when loading Keras model. Please ensure that you are saving the model with model.save() or tf.keras.models.save_model(), *NOT* tf.saved_model.save(). To confirm, there should be a file named "keras_metadata.pb" in the SavedModel directory.
WARNING:tensorflow:No training configuration found in save file, so the model was *not* compiled. Compile it manually.
Found 5117 files belonging to 2 classes.

Notice how we've attached a GlobalAvgPool2D layer after the pretrained VGG16 base. Ordinarily, VGG16 will produce $512$ feature maps for each image. The GlobalAvgPool2D layer reduces each of these to a single value, an "average pixel", if you like.

This next cell will run an image from the Car or Truck dataset through VGG16 and show you the $512$ average pixels created by GlobalAvgPool2D. Run the cell a few times and observe the pixels produced by cars versus the pixels produced by trucks.

In [29]:
car = next(ds_iter)

car_tf = tf.image.resize(car[0], size=[128, 128])
car_features = model(car_tf)
car_features = tf.reshape(car_features, shape=(16, 32))
label = int(tf.squeeze(car[1]).numpy())

plt.figure(figsize=(8, 4))
plt.subplot(121)
# image_dataset_from_directory yields float32 pixels in [0, 255];
# cast to uint8 so imshow displays them instead of clipping to [0, 1]
plt.imshow(tf.squeeze(car[0]).numpy().astype('uint8'))
plt.axis('off')
plt.title(["Car", "Truck"][label])
plt.subplot(122)
plt.imshow(car_features)
plt.title('Pooled Feature Maps')
plt.axis('off');

The VGG16 base produces $512$ feature maps. We can think of each feature map as representing some high-level visual feature in the original image - maybe a wheel or window. Pooling a map gives us a single number, which we could think of as a score for that feature: large if the feature is present, small if it is absent. Cars tend to score high with one set of features, and Trucks score high with another. Now, instead of trying to map raw features to classes, the head only has to work with these scores that GlobalAvgPool2D produced, a much easier problem for it to solve.

5 The Sliding Window¶

In the previous two sections, we learned about the three operations that carry out feature extraction from an image:

  1. filter with a convolution layer
  2. detect with ReLU activation
  3. condense with a maximum pooling layer

The convolution and pooling operations share a common feature: they are both performed over a sliding window. With convolution, this "window" is given by the dimensions of the kernel, the parameter kernel_size. With pooling, it is the pooling window, given by pool_size.

Figure 13: The sliding window travels across the input one cell at a time, performing the operation at each position.

There are two additional parameters affecting both convolution and pooling layers: the strides of the window and whether to use padding at the image edges. The strides parameter says how far the window should move at each step, and the padding parameter describes how we handle the pixels at the edges of the input.

With these two parameters, defining the two layers becomes:

In [30]:
model = keras.Sequential([
    # convolution and relu activation
    layers.Conv2D(filters=64,
                  kernel_size=3,
                  strides=1,
                  padding='same',
                  activation='relu'),
    # max pooling
    layers.MaxPool2D(pool_size=2,
                     strides=1,
                     padding='same')
    # More layers follow
])
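To see what the strides and padding arguments actually do, here is a minimal NumPy sketch of a sliding window. It is an illustration under simplifying assumptions, not Keras's implementation: a single input and output channel, no bias, TensorFlow-style 'same' padding (any odd extra padding pixel goes on the bottom/right), and max pooling with 'valid' padding only. Like Conv2D, `conv2d` computes a cross-correlation (the kernel is not flipped):

```python
import math
import numpy as np

def same_padding(n, k, s):
    """(before, after) zero padding for TensorFlow-style 'same' along one axis."""
    out = math.ceil(n / s)
    total = max((out - 1) * s + k - n, 0)
    return total // 2, total - total // 2  # any odd extra pixel goes after

def extract_patches(x, kh, kw, stride):
    """All 'valid' window positions of a 2D array: shape (out_h, out_w, kh, kw)."""
    out_h = (x.shape[0] - kh) // stride + 1
    out_w = (x.shape[1] - kw) // stride + 1
    patches = np.empty((out_h, out_w, kh, kw), dtype=float)
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride  # the window jumps `stride` pixels per step
            patches[i, j] = x[r:r + kh, c:c + kw]
    return patches

def conv2d(x, kernel, stride=1, padding="valid"):
    """Single-channel 2D convolution (cross-correlation, no bias)."""
    kh, kw = kernel.shape
    if padding == "same":
        top, bottom = same_padding(x.shape[0], kh, stride)
        left, right = same_padding(x.shape[1], kw, stride)
        x = np.pad(x, ((top, bottom), (left, right)))  # pads with zeros
    elif padding != "valid":
        raise ValueError("padding must be 'valid' or 'same'")
    return np.einsum("ijkl,kl->ij", extract_patches(x, kh, kw, stride), kernel)

def max_pool2d(x, pool_size=2, stride=2):
    """Max pooling with 'valid' padding: keep the largest value in each window."""
    return extract_patches(x, pool_size, pool_size, stride).max(axis=(2, 3))

image = np.ones((6, 6))
kernel = np.ones((3, 3))
for stride in (1, 2):
    for padding in ("valid", "same"):
        out = conv2d(image, kernel, stride, padding)
        print(f"strides={stride}, padding={padding!r}:", out.shape)
```

Running `conv2d(np.ones((4, 4)), kernel, padding="same")` is a quick way to see the border dilution discussed under padding: corner outputs only sum 4 real pixels, edge outputs 6, and interior outputs the full 9.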

5.1 Stride¶

The distance the window moves at each step is called the stride. We need to specify the stride in both dimensions of the image: one for moving left to right and one for moving top to bottom. This animation shows strides=(2, 2), a movement of 2 pixels each step:

Figure 14: A stride of (2, 2). The window moves two cells at each step, skipping every other position.

What effect does the stride have? Whenever the stride in either direction is greater than $1$, the sliding window will skip over some of the pixels in the input at each step.

Because we want high-quality features to use for classification, convolutional layers will most often have strides=(1, 1). Increasing the stride means that we miss out on potentially valuable information in our summary. Maximum pooling layers, however, will almost always have stride values greater than $1$, like (2, 2) or (3, 3), but not larger than the window itself.

Finally, note that when the value of the strides is the same number in both directions, you only need to set that number; for instance, instead of strides=(2, 2), you could use strides=2 for the parameter setting.
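The rule that a pooling stride should not exceed the window size can be checked by counting how often each pixel falls inside a window. This pure-Python sketch looks along a single axis with 'valid' padding (the 2D case applies the same logic to each axis); the function names are illustrative only:

```python
def window_starts(n, window, stride):
    """Start positions of a 'valid' sliding window along one axis of length n."""
    return list(range(0, n - window + 1, stride))

def coverage(n, window, stride):
    """How many window positions include each of the n pixels along one axis."""
    counts = [0] * n
    for start in window_starts(n, window, stride):
        for i in range(start, start + window):
            counts[i] += 1
    return counts

print(coverage(8, window=2, stride=2))  # stride == window: every pixel seen exactly once
print(coverage(8, window=2, stride=3))  # stride > window: some pixels are never seen
print(coverage(6, window=3, stride=1))  # stride 1: windows overlap, edges seen least
```

A zero in the output marks a pixel the layer ignores completely, which is exactly the information loss a too-large stride causes.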

5.2 Padding¶

When performing the sliding window computation, there is a question as to what to do at the boundaries of the input. Staying entirely inside the input image means the window will never sit squarely over these boundary pixels like it does for every other pixel in the input. Since we aren't treating all the pixels exactly the same, could there be a problem?

What the convolution does with these boundary values is determined by its padding parameter. In TensorFlow, you have two choices: either padding='same' or padding='valid'. There are trade-offs with each.

When we set padding='valid', the convolution window will stay entirely inside the input. The drawback is that the output shrinks (loses pixels), and shrinks more for larger kernels. This will limit the number of layers the network can contain, especially when inputs are small in size.

The alternative is to use padding='same'. The trick here is to pad the input with $0$'s around its borders, using just enough $0$'s to make the size of the output the same as the size of the input (when the stride is $1$). However, this can dilute the influence of pixels at the borders. The animation below shows a sliding window with 'same' padding:

Figure 15: A sliding window with 'same' padding. Everything outside the actual image is padded with $0$'s.
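The effect of both padding modes on the output size can be written down directly. For an input of side length $n$, a window of size $k$ and a stride $s$, TensorFlow's two padding modes give an output side length of

$$n_{\text{out}}^{\text{valid}} = \left\lfloor \frac{n - k}{s} \right\rfloor + 1, \qquad n_{\text{out}}^{\text{same}} = \left\lceil \frac{n}{s} \right\rceil$$

For example, with the $128 \times 128$ images used in this notebook, a $3 \times 3$ convolution with stride $1$ gives a $126 \times 126$ output under 'valid' padding and $128 \times 128$ under 'same' padding, while a $2 \times 2$ max pool with stride $2$ and 'valid' padding gives $\lfloor 126 / 2 \rfloor + 1 = 64$, i.e. a $64 \times 64$ output.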

The VGG model we've been looking at uses same padding for all of its convolutional layers. Most modern convnets will use some combination of the two. (Another parameter to tune!)

5.3 Exploring Sliding Windows¶

To better understand the effect of the sliding window parameters, it can help to observe a feature extraction on a low-resolution image so that we can see the individual pixels. Let's just look at a simple circle. The VGG architecture is fairly simple. It uses convolution with strides of $1$ and maximum pooling with $2 \times 2$ windows and strides of $2$:

Figure16.jpg

Figure 16: Operations performed on a circle, with small windows and strides.

And that works pretty well! The kernel was designed to detect horizontal lines, and we can see that in the resulting feature map the more horizontal parts of the input end up with the greatest activation.

What would happen if we changed the strides of the convolution to $3$?

Figure17.jpg

Figure 17: Operations performed on a circle, with small windows and larger strides.

This seems to reduce the quality of the feature extracted. Our input circle is rather "finely detailed," being only 1 pixel wide. A convolution with strides of $3$ is too coarse to produce a good feature map from it.

Sometimes, a model will use a convolution with a larger stride in its initial layer. This will usually be coupled with a larger kernel as well. The ResNet50 model, for instance, uses $7 \times 7$ kernels with strides of $2$ in its first layer. This seems to accelerate the production of large-scale features without sacrificing too much information from the input.

5.4 The Receptive Field¶

Trace back all the connections from some neuron and eventually you reach the input image. All of the input pixels a neuron is connected to make up that neuron's receptive field. The receptive field just tells you which parts of the input image a neuron receives information from.

As we've seen, if your first layer is a convolution with $3 \times 3$ kernels, then each neuron in that layer gets input from a $3 \times 3$ patch of pixels (except maybe at the border).

What happens if you add another convolutional layer with $3 \times 3$ kernels? Consider this next illustration:

Figure18.jpg

Figure 18: The receptive field gets larger with depth of convolution.

Now trace back the connections from the neuron at top and you can see that it's connected to a $5 \times 5$ patch of pixels in the input (the bottom layer): each neuron in the $3 \times 3$ patch in the middle layer is connected to a $3 \times 3$ input patch, and together these overlapping patches cover a $5 \times 5$ patch. So that neuron at top has a $5 \times 5$ receptive field. A third convolutional layer would give a $7 \times 7$ receptive field.

So why stack layers like this? Counting the weights of a single-channel kernel, three (3, 3) kernels have $3 \times 9 = 27$ parameters, while one (7, 7) kernel has $49$, even though both create the same receptive field. This stacking-layers trick is one of the ways convnets are able to create large receptive fields without increasing the number of parameters too much.
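The receptive-field growth and the parameter comparison can both be computed. This is a pure-Python sketch using the standard recurrence for stacked sliding-window layers; it assumes square windows, and `conv_weights` counts kernel weights only (no bias). The names are illustrative, not from any library:

```python
def receptive_field(layers):
    """Side length (in input pixels) of one last-layer neuron's receptive field.

    `layers` is a list of (window_size, stride) pairs, first layer first; it works
    for convolution and pooling layers alike.
    """
    field, jump = 1, 1
    for window, stride in layers:
        # Each layer widens the field by (window - 1) steps, where one step is the
        # spacing, in input pixels, between neighboring units of the previous layer.
        field += (window - 1) * jump
        jump *= stride
    return field

def conv_weights(kernel_size, in_channels=1, out_channels=1):
    """Number of kernel weights (bias excluded) in a Conv2D layer with a square kernel."""
    return kernel_size * kernel_size * in_channels * out_channels

three_3x3 = [(3, 1)] * 3
one_7x7 = [(7, 1)]
print(receptive_field(three_3x3), 3 * conv_weights(3))  # same field, fewer weights
print(receptive_field(one_7x7), conv_weights(7))
```

With $C$ input and output channels in every layer, the comparison becomes $27C^2$ against $49C^2$, so the saving from stacking holds at any width. Adding a pooling layer with stride $2$ between the convolutions (for example `[(3, 1), (2, 2), (3, 1)]`) makes the later layers' field grow faster, which `receptive_field` reports as $8$.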

5.5 One-Dimensional Convolution¶

We've seen how convolutional networks can learn to extract features from (two-dimensional) images. It turns out that convnets can also learn to extract features from things like time-series (one-dimensional) and video (three-dimensional).

Let's see what convolution looks like on a time-series.

The time series we'll use is from Google Trends. It measures the popularity of the search term "machine learning" for weeks from January 25, 2015 to January 15, 2020.

In [31]:
# Load the time series as a Pandas dataframe
machinelearning = pd.read_csv(
    'Assets/machinelearning.csv',
    parse_dates=['Week'],
    index_col='Week',
)

machinelearning.plot();

What about the kernels? Images are two-dimensional, and so our kernels were 2D arrays. A time-series is one-dimensional, so the kernel should also be a 1D array. Here are some kernels sometimes used on time-series data:

In [32]:
detrend = tf.constant([-1, 1], dtype=tf.float32)

average = tf.constant([0.2, 0.2, 0.2, 0.2, 0.2], dtype=tf.float32)

# Spencer's 15-point moving average: the weights are symmetric and sum to 320
spencer = tf.constant([-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3], dtype=tf.float32) / 320

Convolution on a sequence works just like convolution on an image. The difference is just that a sliding window on a sequence only has one direction to travel (left to right) instead of the two directions on an image. And just like before, the features picked out depend on the pattern of numbers in the kernel.

Uncomment one of the kernels below to see what kind of features they extract:

In [33]:
# UNCOMMENT ONE
kernel = detrend
# kernel = average
# kernel = spencer

# Reformat for TensorFlow
ts_data = machinelearning.to_numpy()
ts_data = tf.expand_dims(ts_data, axis=0)
ts_data = tf.cast(ts_data, dtype=tf.float32)
kern = tf.reshape(kernel, shape=(*kernel.shape, 1, 1))

ts_filter = tf.nn.conv1d(
    input=ts_data,
    filters=kern,
    stride=1,
    padding='VALID',
)

# Format as Pandas Series
machinelearning_filtered = pd.Series(tf.squeeze(ts_filter).numpy())

machinelearning_filtered.plot();

In fact, the detrend kernel filters for changes in the series, while average and spencer are both "smoothers" that filter for low-frequency components in the series.
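Both claims can be checked on toy series. The sketch below uses NumPy's `np.correlate` in 'valid' mode as a stand-in for the `tf.nn.conv1d` cell above; for a single channel with stride $1$ and 'VALID' padding both compute `out[t] = sum_m series[t + m] * kernel[m]`. The kernels are NumPy copies of the ones defined earlier, and the test series are made up for illustration:

```python
import numpy as np

# NumPy copies of the three kernels above (Spencer's standard 15-point weights).
detrend = np.array([-1.0, 1.0])
average = np.full(5, 0.2)
spencer = np.array([-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]) / 320

def filter_series(series, kernel):
    """'valid' 1D sliding window: out[t] = sum_m series[t + m] * kernel[m].

    This is cross-correlation (no kernel flip), matching what a convolution layer
    computes for a single channel with stride 1 and no padding.
    """
    return np.correlate(np.asarray(series, dtype=float), kernel, mode="valid")

# detrend responds to change: on a series it returns successive differences.
print(filter_series([1, 4, 9, 16], detrend))

# average and spencer are smoothers with weights summing to 1: a constant series
# stays constant, and (for the symmetric spencer kernel) a straight line passes through.
print(filter_series(np.full(10, 3.0), average))
trend = np.arange(30.0)
print(filter_series(trend, spencer)[:3], trend[7:10])
```

Note that each 'valid' output is shorter than its input by `len(kernel) - 1` values, just as in the 2D case.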

If you were interested in predicting the future popularity of search terms, you might train a convnet on time-series like this one. It would try to learn what features in those series are most informative for the prediction.

Though convnets are not often the best choice on their own for these kinds of problems, they are often incorporated into other models for their feature extraction capabilities.

6 Custom Convnets¶

In the last three sections, we saw how convolutional networks perform feature extraction through three operations: filter, detect, and condense. A single round of feature extraction can only extract relatively simple features from an image, things like simple lines or contrasts. These are too simple to solve most classification problems. Instead, convnets will repeat this extraction over and over, so that the features become more complex and refined as they travel deeper into the network. This repeated extraction is organized into convolutional blocks.

6.1 Convolutional Blocks¶

A convnet does this by passing its features through long chains of convolutional blocks which perform this extraction. These convolutional blocks are stacks of Conv2D and MaxPool2D layers, whose role in feature extraction we learned about in the last few sections.

Each block represents a round of extraction, and by composing these blocks the convnet can combine and recombine the features produced, growing them and shaping them to better fit the problem at hand. The deep structure of modern convnets is what allows this sophisticated feature engineering and has been largely responsible for their superior performance.

Let's see how to define a deep convolutional network capable of engineering complex features. In this example, we'll create a Keras Sequential model and then train it on our Car or Truck dataset. Here is a diagram of the model we'll use:

Figure19.jpg

Figure 19: Diagram of the model being built. The base includes the first, second and third blocks, while the head is the final block.

Now we'll define the model. See how our model consists of three blocks of Conv2D and MaxPool2D layers (the base) followed by a head of Dense layers. We can translate this diagram more or less directly into a Keras Sequential model just by filling in the appropriate parameters:

In [34]:
model = keras.Sequential([
    # Block One
    layers.Conv2D(filters=32, kernel_size=3, activation='relu', padding='same',
                  input_shape=[128, 128, 3]),
    layers.MaxPool2D(),

    # Block Two
    layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPool2D(),

    # Block Three
    layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPool2D(),

    # Head
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid'),
])

Notice in this definition how the number of filters doubled block by block: $32$, $64$, $128$. This is a common pattern. Since each MaxPool2D layer halves the height and width of the feature maps, we can afford to create more of them.
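To see how the sizes and parameter counts evolve through the blocks, here is a pure-Python trace of the model above. It assumes the layer settings shown in the cell (3x3 kernels, stride 1, 'same' padding, and `MaxPool2D()` defaults of `pool_size=2`, `strides=2`, 'valid' padding) and the standard weights-plus-biases counts; under those assumptions the totals should agree with what `model.summary()` reports:

```python
def conv_params(kernel_size, in_channels, filters):
    """Weights plus biases of a Conv2D layer with a square kernel."""
    return (kernel_size * kernel_size * in_channels + 1) * filters

def dense_params(n_inputs, units):
    """Weights plus biases of a Dense layer."""
    return (n_inputs + 1) * units

size, channels = 128, 3            # input_shape=[128, 128, 3]
blocks = [[32], [64], [128, 128]]  # filters of each Conv2D, block by block
rows = []
for block in blocks:
    for filters in block:
        # kernel_size=3, strides=1, padding='same': height and width are unchanged
        rows.append(("Conv2D", size, filters, conv_params(3, channels, filters)))
        channels = filters
    # MaxPool2D() with its defaults halves the height and width
    size //= 2
    rows.append(("MaxPool2D", size, channels, 0))

flat = size * size * channels
rows.append(("Flatten", 1, flat, 0))
rows.append(("Dense", 1, 6, dense_params(flat, 6)))
rows.append(("Dropout", 1, 6, 0))
rows.append(("Dense", 1, 1, dense_params(6, 1)))

for name, side, depth, params in rows:
    print(f"{name:10s} {side:>4d} x {side:<4d} x {depth:<6d} params: {params}")
total = sum(row[3] for row in rows)
print("total params:", total)
```

Each pooling step divides the number of pixels per map by $4$ while the filter count only doubles, so the feature volume after each block shrinks ($128 \cdot 128 \cdot 32$, then $64 \cdot 64 \cdot 64$, then $32 \cdot 32 \cdot 128$ before the last pool). The trace also shows that the single `Dense(6)` layer after Flatten holds most of the model's parameters.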

We can compile this model just like before:

In [35]:
model.compile(
    optimizer=tf.keras.optimizers.Adam(epsilon=0.01),
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

And now training:

In [36]:
history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=50,
)
Epoch 1/50
80/80 [==============================] - 132s 2s/step - loss: 0.6787 - binary_accuracy: 0.5771 - val_loss: 0.6684 - val_binary_accuracy: 0.5785
Epoch 2/50
80/80 [==============================] - 128s 2s/step - loss: 0.6672 - binary_accuracy: 0.5787 - val_loss: 0.6626 - val_binary_accuracy: 0.5785
Epoch 3/50
80/80 [==============================] - 130s 2s/step - loss: 0.6617 - binary_accuracy: 0.5785 - val_loss: 0.6548 - val_binary_accuracy: 0.5785
Epoch 4/50
80/80 [==============================] - 131s 2s/step - loss: 0.6565 - binary_accuracy: 0.5804 - val_loss: 0.6515 - val_binary_accuracy: 0.6298
Epoch 5/50
80/80 [==============================] - 132s 2s/step - loss: 0.6458 - binary_accuracy: 0.6232 - val_loss: 0.6378 - val_binary_accuracy: 0.6316
Epoch 6/50
80/80 [==============================] - 132s 2s/step - loss: 0.6394 - binary_accuracy: 0.6269 - val_loss: 0.6240 - val_binary_accuracy: 0.6413
Epoch 7/50
80/80 [==============================] - 133s 2s/step - loss: 0.6255 - binary_accuracy: 0.6457 - val_loss: 0.6125 - val_binary_accuracy: 0.6622
Epoch 8/50
80/80 [==============================] - 136s 2s/step - loss: 0.6125 - binary_accuracy: 0.6603 - val_loss: 0.5965 - val_binary_accuracy: 0.6836
Epoch 9/50
80/80 [==============================] - 141s 2s/step - loss: 0.5980 - binary_accuracy: 0.6785 - val_loss: 0.5816 - val_binary_accuracy: 0.6929
Epoch 10/50
80/80 [==============================] - 131s 2s/step - loss: 0.5875 - binary_accuracy: 0.6811 - val_loss: 0.5694 - val_binary_accuracy: 0.7064
Epoch 11/50
80/80 [==============================] - 129s 2s/step - loss: 0.5575 - binary_accuracy: 0.7172 - val_loss: 0.5498 - val_binary_accuracy: 0.7181
Epoch 12/50
80/80 [==============================] - 131s 2s/step - loss: 0.5377 - binary_accuracy: 0.7293 - val_loss: 0.5283 - val_binary_accuracy: 0.7438
Epoch 13/50
80/80 [==============================] - 132s 2s/step - loss: 0.5106 - binary_accuracy: 0.7520 - val_loss: 0.5138 - val_binary_accuracy: 0.7561
Epoch 14/50
80/80 [==============================] - 132s 2s/step - loss: 0.4946 - binary_accuracy: 0.7549 - val_loss: 0.4895 - val_binary_accuracy: 0.7644
Epoch 15/50
80/80 [==============================] - 132s 2s/step - loss: 0.4606 - binary_accuracy: 0.7774 - val_loss: 0.4675 - val_binary_accuracy: 0.7804
Epoch 16/50
80/80 [==============================] - 132s 2s/step - loss: 0.4497 - binary_accuracy: 0.7868 - val_loss: 0.4469 - val_binary_accuracy: 0.7915
Epoch 17/50
80/80 [==============================] - 132s 2s/step - loss: 0.4099 - binary_accuracy: 0.8067 - val_loss: 0.4589 - val_binary_accuracy: 0.7769
Epoch 18/50
80/80 [==============================] - 132s 2s/step - loss: 0.3772 - binary_accuracy: 0.8251 - val_loss: 0.4216 - val_binary_accuracy: 0.8060
Epoch 19/50
80/80 [==============================] - 148s 2s/step - loss: 0.3670 - binary_accuracy: 0.8355 - val_loss: 0.4302 - val_binary_accuracy: 0.7959
Epoch 20/50
80/80 [==============================] - 144s 2s/step - loss: 0.3443 - binary_accuracy: 0.8423 - val_loss: 0.4024 - val_binary_accuracy: 0.8183
Epoch 21/50
80/80 [==============================] - 128s 2s/step - loss: 0.3041 - binary_accuracy: 0.8605 - val_loss: 0.3910 - val_binary_accuracy: 0.8206
Epoch 22/50
80/80 [==============================] - 128s 2s/step - loss: 0.2962 - binary_accuracy: 0.8659 - val_loss: 0.4194 - val_binary_accuracy: 0.8020
Epoch 23/50
80/80 [==============================] - 129s 2s/step - loss: 0.2822 - binary_accuracy: 0.8730 - val_loss: 0.4006 - val_binary_accuracy: 0.8238
Epoch 24/50
80/80 [==============================] - 131s 2s/step - loss: 0.2784 - binary_accuracy: 0.8714 - val_loss: 0.3950 - val_binary_accuracy: 0.8305
Epoch 25/50
80/80 [==============================] - 131s 2s/step - loss: 0.2356 - binary_accuracy: 0.9003 - val_loss: 0.4134 - val_binary_accuracy: 0.8264
Epoch 26/50
80/80 [==============================] - 131s 2s/step - loss: 0.2361 - binary_accuracy: 0.8947 - val_loss: 0.4096 - val_binary_accuracy: 0.8285
Epoch 27/50
80/80 [==============================] - 130s 2s/step - loss: 0.2350 - binary_accuracy: 0.8947 - val_loss: 0.4014 - val_binary_accuracy: 0.8335
Epoch 28/50
80/80 [==============================] - 128s 2s/step - loss: 0.2197 - binary_accuracy: 0.9064 - val_loss: 0.4308 - val_binary_accuracy: 0.8297
Epoch 29/50
80/80 [==============================] - 128s 2s/step - loss: 0.2080 - binary_accuracy: 0.9080 - val_loss: 0.4925 - val_binary_accuracy: 0.8200
Epoch 30/50
80/80 [==============================] - 129s 2s/step - loss: 0.1962 - binary_accuracy: 0.9162 - val_loss: 0.4393 - val_binary_accuracy: 0.8398
Epoch 31/50
80/80 [==============================] - 131s 2s/step - loss: 0.1661 - binary_accuracy: 0.9304 - val_loss: 0.4564 - val_binary_accuracy: 0.8454
Epoch 32/50
80/80 [==============================] - 131s 2s/step - loss: 0.1290 - binary_accuracy: 0.9412 - val_loss: 0.4831 - val_binary_accuracy: 0.8424
Epoch 33/50
80/80 [==============================] - 131s 2s/step - loss: 0.1331 - binary_accuracy: 0.9427 - val_loss: 0.4734 - val_binary_accuracy: 0.8329
Epoch 34/50
80/80 [==============================] - 130s 2s/step - loss: 0.1385 - binary_accuracy: 0.9455 - val_loss: 0.4514 - val_binary_accuracy: 0.8442
Epoch 35/50
80/80 [==============================] - 128s 2s/step - loss: 0.1325 - binary_accuracy: 0.9437 - val_loss: 0.5582 - val_binary_accuracy: 0.8303
Epoch 36/50
80/80 [==============================] - 130s 2s/step - loss: 0.1143 - binary_accuracy: 0.9533 - val_loss: 0.5562 - val_binary_accuracy: 0.8478
Epoch 37/50
80/80 [==============================] - 130s 2s/step - loss: 0.1069 - binary_accuracy: 0.9552 - val_loss: 0.5415 - val_binary_accuracy: 0.8406
Epoch 38/50
80/80 [==============================] - 132s 2s/step - loss: 0.1036 - binary_accuracy: 0.9564 - val_loss: 0.5453 - val_binary_accuracy: 0.8446
Epoch 39/50
80/80 [==============================] - 133s 2s/step - loss: 0.1057 - binary_accuracy: 0.9566 - val_loss: 0.5346 - val_binary_accuracy: 0.8446
Epoch 40/50
80/80 [==============================] - 131s 2s/step - loss: 0.1114 - binary_accuracy: 0.9511 - val_loss: 0.4829 - val_binary_accuracy: 0.8462
Epoch 41/50
80/80 [==============================] - 131s 2s/step - loss: 0.1016 - binary_accuracy: 0.9576 - val_loss: 0.5598 - val_binary_accuracy: 0.8485
Epoch 42/50
80/80 [==============================] - 131s 2s/step - loss: 0.0932 - binary_accuracy: 0.9623 - val_loss: 0.7420 - val_binary_accuracy: 0.8313
Epoch 43/50
80/80 [==============================] - 131s 2s/step - loss: 0.0936 - binary_accuracy: 0.9594 - val_loss: 0.5957 - val_binary_accuracy: 0.8382
Epoch 44/50
80/80 [==============================] - 131s 2s/step - loss: 0.1035 - binary_accuracy: 0.9603 - val_loss: 0.7029 - val_binary_accuracy: 0.8295
Epoch 45/50
80/80 [==============================] - 131s 2s/step - loss: 0.0874 - binary_accuracy: 0.9644 - val_loss: 0.7274 - val_binary_accuracy: 0.8278
Epoch 46/50
80/80 [==============================] - 131s 2s/step - loss: 0.0841 - binary_accuracy: 0.9656 - val_loss: 0.6475 - val_binary_accuracy: 0.8474
Epoch 47/50
80/80 [==============================] - 131s 2s/step - loss: 0.0781 - binary_accuracy: 0.9683 - val_loss: 0.6149 - val_binary_accuracy: 0.8501
Epoch 48/50
80/80 [==============================] - 132s 2s/step - loss: 0.0619 - binary_accuracy: 0.9742 - val_loss: 0.6093 - val_binary_accuracy: 0.8448
Epoch 49/50
80/80 [==============================] - 128s 2s/step - loss: 0.0587 - binary_accuracy: 0.9762 - val_loss: 0.6422 - val_binary_accuracy: 0.8474
Epoch 50/50
80/80 [==============================] - 128s 2s/step - loss: 0.0570 - binary_accuracy: 0.9754 - val_loss: 0.7865 - val_binary_accuracy: 0.8491

Finally we plot the results:

In [37]:
history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();

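The learning curves for the model diverge after about 20 epochs: the training loss keeps falling (to about $0.06$ by epoch 50), while the validation loss bottoms out near $0.39$ around epoch 21 and then climbs back up (to about $0.79$). This indicates that the model is overfitting and in need of more regularization. The Dropout layer in the head may have helped early in training, but it was not enough to prevent the divergence.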
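One common response to this pattern is to stop training once the validation loss stops improving. The pure-Python sketch below mimics the basic patience rule behind Keras's `tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)` (it ignores options such as `min_delta` and `baseline`), applied to the `val_loss` values copied from the training log above. In the notebook itself, `history.history['val_loss']` would supply the same list:

```python
# val_loss values copied from the 50-epoch training log above.
VAL_LOSS = [
    0.6684, 0.6626, 0.6548, 0.6515, 0.6378, 0.6240, 0.6125, 0.5965, 0.5816, 0.5694,
    0.5498, 0.5283, 0.5138, 0.4895, 0.4675, 0.4469, 0.4589, 0.4216, 0.4302, 0.4024,
    0.3910, 0.4194, 0.4006, 0.3950, 0.4134, 0.4096, 0.4014, 0.4308, 0.4925, 0.4393,
    0.4564, 0.4831, 0.4734, 0.4514, 0.5582, 0.5562, 0.5415, 0.5453, 0.5346, 0.4829,
    0.5598, 0.7420, 0.5957, 0.7029, 0.7274, 0.6475, 0.6149, 0.6093, 0.6422, 0.7865,
]

def early_stopping_epoch(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a simple patience rule.

    Training stops once `patience` consecutive epochs fail to beat the best
    validation loss seen so far.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses)

print(early_stopping_epoch(VAL_LOSS, patience=5))
```

With a patience of 5, the rule would have halted training shortly after the validation minimum instead of running all 50 epochs; passing `restore_best_weights=True` to the Keras callback additionally rolls the model back to the best epoch's weights.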

7 Data Augmentation¶

The best way to improve the performance of a machine learning model is to train it on more data. The more examples the model has to learn from, the better it will be able to recognize which differences in images matter and which do not. More data helps the model to generalize better.

One easy way of getting more data is to use the data we already have. If we can transform the images in our dataset in ways that preserve the class, we can teach our classifier to ignore those kinds of transformations. For instance, whether a car is facing left or right in a photo doesn't change the fact that it is a Car and not a Truck. So, if we augment our training data with flipped images, our classifier will learn that "left or right" is a difference it should ignore.

And that's the whole idea behind data augmentation: add in some extra fake data that looks reasonably like the real data and your classifier will improve.

Typically, many kinds of transformation are used when augmenting a dataset. These might include rotating the image, adjusting the color or contrast, warping the image, or many other things, usually applied in combination. Data augmentation is usually done online, meaning as the images are being fed into the network for training. Recall that training is usually done on mini-batches of data. Here is a sample of the different ways a single image might be transformed:

Figure20.jpg

Figure 20: An image of a Car that has been augmented in different ways.

Each time an image is used during training, a new random transformation is applied. This way, the model is always seeing something a little different from what it has seen before. This extra variance in the training data is what helps the model generalize to new data.
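The "new random transformation every time" idea can be sketched in a few lines of NumPy. The function below is hypothetical and much simpler than the Keras preprocessing layers used later: it applies a random left-right flip and a random shift of up to 10% of each side. It uses `np.roll`, which wraps pixels around the border, whereas Keras's `RandomTranslation` fills the uncovered region instead (by reflection, by default):

```python
import numpy as np

def random_augment(image, rng, max_shift=0.1):
    """Hypothetical minimal augmentation: random horizontal flip plus random shift.

    image: (height, width, channels) array. A fresh random draw is made on every
    call, so each pass over the data sees a slightly different version of the image.
    """
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # left-to-right flip
    h, w = image.shape[:2]
    dy = int(rng.integers(-int(max_shift * h), int(max_shift * h) + 1))
    dx = int(rng.integers(-int(max_shift * w), int(max_shift * w) + 1))
    # np.roll wraps pixels that leave one edge back in at the opposite edge.
    return np.roll(out, shift=(dy, dx), axis=(0, 1))

rng = np.random.default_rng(42)
img = np.arange(32 * 32 * 3, dtype=float).reshape(32, 32, 3)
outs = [random_augment(img, rng) for _ in range(8)]
for o in outs:
    print(o.shape, o[0, 0])
```

Because flips and wrapped shifts only rearrange pixels, every augmented copy contains exactly the same pixel values as the original, in a different arrangement; the class label is untouched, which is the property the text requires of a useful transformation.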

It's important to remember though that not every transformation will be useful on a given problem. Most importantly, whatever transformations we use should not mix up the classes. If we were training a digit recognizer, for instance, rotating images would mix up '9's and '6's.

Keras lets us augment our data in two ways. The first way is to include it in the data pipeline with a utility like ImageDataGenerator. The second way is to include it in the model definition by using Keras's preprocessing layers. This is the approach that we'll take. The primary advantage for us is that the image transformations will be computed on the GPU instead of the CPU, potentially speeding up training.

Let's import the data as before:

In [38]:
# load training and validation sets
ds_train_ = image_dataset_from_directory(
    'Assets/car-or-truck/train',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=True,
)
ds_valid_ = image_dataset_from_directory(
    'Assets/car-or-truck/valid',
    labels='inferred',
    label_mode='binary',
    image_size=[128, 128],
    interpolation='nearest',
    batch_size=64,
    shuffle=False,
)

# Data Pipeline
def convert_to_float(image, label):
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    return image, label

AUTOTUNE = tf.data.experimental.AUTOTUNE
ds_train = (
    ds_train_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)
ds_valid = (
    ds_valid_
    .map(convert_to_float)
    .cache()
    .prefetch(buffer_size=AUTOTUNE)
)

# get the pretrained base
pretrained_base = tf.keras.models.load_model(
    'models/vgg16'
)

pretrained_base.trainable = False
Found 5117 files belonging to 2 classes.
Found 5051 files belonging to 2 classes.

Uncomment a transformation to see what it does:

In [39]:
from tensorflow.keras.layers.experimental import preprocessing

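# the "factor" parameters are fractions (e.g. 0.1 means up to 10%);
# for RandomRotation the factor is a fraction of a full turn, so 0.20 means up to +/-72 degrees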
augment = keras.Sequential([
    # preprocessing.RandomContrast(factor=0.5),
    # preprocessing.RandomFlip(mode='horizontal'), # meaning, left-to-right
    # preprocessing.RandomFlip(mode='vertical'), # meaning, top-to-bottom
    # preprocessing.RandomWidth(factor=0.15), # horizontal stretch
    # preprocessing.RandomRotation(factor=0.20),
    preprocessing.RandomTranslation(height_factor=0.1, width_factor=0.1),
])


ex = next(iter(ds_train.unbatch().map(lambda x, y: x).batch(1)))

plt.figure(figsize=(10,10))
for i in range(16):
    image = augment(ex, training=True)
    plt.subplot(4, 4, i+1)
    # pixel values are float32 in [0, 255]; cast so imshow does not clip them
    plt.imshow(tf.squeeze(image).numpy().astype('uint8'))
    plt.axis('off')
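Each `factor` sets the *range* that a random amount is drawn from, not a fixed amount of change. As a rough guide, this plain-Python sketch converts the settings above into concrete ranges for the 128 × 128 images used here. It follows the Keras documentation for each layer. The helper functions are hypothetical and do not call Keras.

```python
# Sketch: convert the "factor" arguments used above into concrete ranges
# for a 128 x 128 image, following the Keras docs for each layer.
IMAGE_SIZE = 128  # height and width of the images in this dataset

def rotation_range_degrees(factor):
    """RandomRotation: factor is a fraction of a full turn (2*pi rad = 360 degrees)."""
    return (-factor * 360.0, factor * 360.0)

def translation_range_pixels(factor, size=IMAGE_SIZE):
    """RandomTranslation: factor is a fraction of the image height (or width)."""
    return (-factor * size, factor * size)

def contrast_range(factor):
    """RandomContrast: the contrast multiplier is drawn from [1 - factor, 1 + factor]."""
    return (1.0 - factor, 1.0 + factor)

print(rotation_range_degrees(0.20))    # (-72.0, 72.0)  degrees
print(translation_range_pixels(0.1))   # (-12.8, 12.8)  pixels
print(contrast_range(0.5))             # (0.5, 1.5)
```

For example, `RandomRotation(factor=0.20)` can turn an image by up to 72 degrees either way, which is a much stronger distortion than its "0.20" might suggest. RandomContrast then scales each pixel's distance from the channel mean by the sampled multiplier.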

Now let's define the model. It is the same as before, but with data augmentation layers added at the front:

In [40]:
model = keras.Sequential([
    layers.InputLayer(input_shape=[128, 128, 3]),
    
    # Data Augmentation (applied to training batches only)
    preprocessing.RandomContrast(factor=0.10),
    preprocessing.RandomFlip(mode='horizontal'),
    preprocessing.RandomRotation(factor=0.10),
    
    # Block One (input shape already set by the InputLayer above)
    layers.Conv2D(filters=32, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPool2D(),

    # Block Two
    layers.Conv2D(filters=64, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPool2D(),

    # Block Three
    layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    layers.Conv2D(filters=128, kernel_size=3, activation='relu', padding='same'),
    layers.MaxPool2D(),

    # Head
    layers.Flatten(),
    layers.Dense(6, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1, activation='sigmoid'),
])
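Because the augmentation layers are part of the model, each training batch gets fresh random transformations on the fly. The layers pass inputs through unchanged when the model runs in inference mode, so validation data and predictions are not augmented. Keras only activates its random preprocessing layers when `training=True`. Below is a minimal NumPy sketch of that behavior for a horizontal flip. `random_flip_horizontal` is a hypothetical helper written for illustration, not the Keras implementation.

```python
import numpy as np

def random_flip_horizontal(images, training, rng):
    """Illustrative stand-in for a horizontal RandomFlip layer (not Keras's code).

    images: array of shape (batch, height, width, channels).
    training=True  -> each image is mirrored left-to-right with probability 0.5.
    training=False -> the batch is returned unchanged, as Keras augmentation
                      layers do at inference time.
    """
    if not training:
        return images
    flip = rng.random(images.shape[0]) < 0.5   # one coin toss per image
    out = images.copy()
    out[flip] = out[flip][:, :, ::-1, :]       # reverse the width axis of the chosen images
    return out

rng = np.random.default_rng(0)
batch = rng.random((4, 128, 128, 3)).astype("float32")  # random stand-in for real images

train_out = random_flip_horizontal(batch, training=True, rng=rng)
infer_out = random_flip_horizontal(batch, training=False, rng=rng)
print("unchanged at inference:", np.array_equal(infer_out, batch))  # True
print("images flipped in training:",
      sum(not np.array_equal(a, b) for a, b in zip(train_out, batch)))
```

The same image can therefore appear flipped in one epoch and unflipped in the next. This variety is what discourages the network from memorizing individual training images.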

Now compile, fit, and plot:

In [41]:
# compile model
optimizer = tf.keras.optimizers.Adam(epsilon=0.01)
model.compile(
    optimizer=optimizer,
    loss='binary_crossentropy',
    metrics=['binary_accuracy'],
)

# fit model
history = model.fit(
    ds_train,
    validation_data=ds_valid,
    epochs=50,
)

# Plot learning curves
history_frame = pd.DataFrame(history.history)
history_frame.loc[:, ['loss', 'val_loss']].plot()
history_frame.loc[:, ['binary_accuracy', 'val_binary_accuracy']].plot();
Epoch 1/50
80/80 [==============================] - 192s 2s/step - loss: 0.6834 - binary_accuracy: 0.5705 - val_loss: 0.6747 - val_binary_accuracy: 0.5785
Epoch 2/50
80/80 [==============================] - 177s 2s/step - loss: 0.6751 - binary_accuracy: 0.5787 - val_loss: 0.6651 - val_binary_accuracy: 0.5785
Epoch 3/50
80/80 [==============================] - 178s 2s/step - loss: 0.6675 - binary_accuracy: 0.5787 - val_loss: 0.6607 - val_binary_accuracy: 0.5785
Epoch 4/50
80/80 [==============================] - 178s 2s/step - loss: 0.6622 - binary_accuracy: 0.5855 - val_loss: 0.6487 - val_binary_accuracy: 0.6260
Epoch 5/50
80/80 [==============================] - 179s 2s/step - loss: 0.6552 - binary_accuracy: 0.6062 - val_loss: 0.6454 - val_binary_accuracy: 0.6296
Epoch 6/50
80/80 [==============================] - 178s 2s/step - loss: 0.6448 - binary_accuracy: 0.6123 - val_loss: 0.6330 - val_binary_accuracy: 0.6436
Epoch 7/50
80/80 [==============================] - 178s 2s/step - loss: 0.6368 - binary_accuracy: 0.6213 - val_loss: 0.6237 - val_binary_accuracy: 0.6520
Epoch 8/50
80/80 [==============================] - 179s 2s/step - loss: 0.6297 - binary_accuracy: 0.6267 - val_loss: 0.6286 - val_binary_accuracy: 0.6480
Epoch 9/50
80/80 [==============================] - 180s 2s/step - loss: 0.6285 - binary_accuracy: 0.6342 - val_loss: 0.6116 - val_binary_accuracy: 0.6603
Epoch 10/50
80/80 [==============================] - 181s 2s/step - loss: 0.6229 - binary_accuracy: 0.6426 - val_loss: 0.6096 - val_binary_accuracy: 0.6688
Epoch 11/50
80/80 [==============================] - 180s 2s/step - loss: 0.6195 - binary_accuracy: 0.6447 - val_loss: 0.6029 - val_binary_accuracy: 0.6765
Epoch 12/50
80/80 [==============================] - 179s 2s/step - loss: 0.6128 - binary_accuracy: 0.6473 - val_loss: 0.6001 - val_binary_accuracy: 0.6710
Epoch 13/50
80/80 [==============================] - 177s 2s/step - loss: 0.6107 - binary_accuracy: 0.6525 - val_loss: 0.5951 - val_binary_accuracy: 0.6850
Epoch 14/50
80/80 [==============================] - 177s 2s/step - loss: 0.6047 - binary_accuracy: 0.6639 - val_loss: 0.5886 - val_binary_accuracy: 0.6805
Epoch 15/50
80/80 [==============================] - 178s 2s/step - loss: 0.6000 - binary_accuracy: 0.6605 - val_loss: 0.5786 - val_binary_accuracy: 0.6977
Epoch 16/50
80/80 [==============================] - 180s 2s/step - loss: 0.5920 - binary_accuracy: 0.6732 - val_loss: 0.5738 - val_binary_accuracy: 0.7062
Epoch 17/50
80/80 [==============================] - 177s 2s/step - loss: 0.5847 - binary_accuracy: 0.6791 - val_loss: 0.5685 - val_binary_accuracy: 0.7117
Epoch 18/50
80/80 [==============================] - 178s 2s/step - loss: 0.5787 - binary_accuracy: 0.6830 - val_loss: 0.5497 - val_binary_accuracy: 0.7292
Epoch 19/50
80/80 [==============================] - 178s 2s/step - loss: 0.5694 - binary_accuracy: 0.6895 - val_loss: 0.5532 - val_binary_accuracy: 0.7157
Epoch 20/50
80/80 [==============================] - 178s 2s/step - loss: 0.5584 - binary_accuracy: 0.7073 - val_loss: 0.5241 - val_binary_accuracy: 0.7446
Epoch 21/50
80/80 [==============================] - 181s 2s/step - loss: 0.5524 - binary_accuracy: 0.6994 - val_loss: 0.5135 - val_binary_accuracy: 0.7535
Epoch 22/50
80/80 [==============================] - 182s 2s/step - loss: 0.5380 - binary_accuracy: 0.7127 - val_loss: 0.4931 - val_binary_accuracy: 0.7696
Epoch 23/50
80/80 [==============================] - 181s 2s/step - loss: 0.5228 - binary_accuracy: 0.7205 - val_loss: 0.4959 - val_binary_accuracy: 0.7674
Epoch 24/50
80/80 [==============================] - 182s 2s/step - loss: 0.5189 - binary_accuracy: 0.7262 - val_loss: 0.4969 - val_binary_accuracy: 0.7577
Epoch 25/50
80/80 [==============================] - 181s 2s/step - loss: 0.5133 - binary_accuracy: 0.7266 - val_loss: 0.4838 - val_binary_accuracy: 0.7573
Epoch 26/50
80/80 [==============================] - 177s 2s/step - loss: 0.5088 - binary_accuracy: 0.7305 - val_loss: 0.5092 - val_binary_accuracy: 0.7535
Epoch 27/50
80/80 [==============================] - 179s 2s/step - loss: 0.4933 - binary_accuracy: 0.7338 - val_loss: 0.4638 - val_binary_accuracy: 0.7814
Epoch 28/50
80/80 [==============================] - 177s 2s/step - loss: 0.4868 - binary_accuracy: 0.7415 - val_loss: 0.4869 - val_binary_accuracy: 0.7820
Epoch 29/50
80/80 [==============================] - 177s 2s/step - loss: 0.4699 - binary_accuracy: 0.7500 - val_loss: 0.4731 - val_binary_accuracy: 0.7743
Epoch 30/50
80/80 [==============================] - 177s 2s/step - loss: 0.4586 - binary_accuracy: 0.7649 - val_loss: 0.4431 - val_binary_accuracy: 0.7824
Epoch 31/50
80/80 [==============================] - 182s 2s/step - loss: 0.4510 - binary_accuracy: 0.7872 - val_loss: 0.4247 - val_binary_accuracy: 0.8004
Epoch 32/50
80/80 [==============================] - 182s 2s/step - loss: 0.4424 - binary_accuracy: 0.7878 - val_loss: 0.4386 - val_binary_accuracy: 0.7832
Epoch 33/50
80/80 [==============================] - 182s 2s/step - loss: 0.4394 - binary_accuracy: 0.7932 - val_loss: 0.4483 - val_binary_accuracy: 0.7897
Epoch 34/50
80/80 [==============================] - 181s 2s/step - loss: 0.4358 - binary_accuracy: 0.7928 - val_loss: 0.4377 - val_binary_accuracy: 0.7987
Epoch 35/50
80/80 [==============================] - 181s 2s/step - loss: 0.4243 - binary_accuracy: 0.8038 - val_loss: 0.4291 - val_binary_accuracy: 0.8012
Epoch 36/50
80/80 [==============================] - 181s 2s/step - loss: 0.4071 - binary_accuracy: 0.8116 - val_loss: 0.4392 - val_binary_accuracy: 0.7983
Epoch 37/50
80/80 [==============================] - 190s 2s/step - loss: 0.4067 - binary_accuracy: 0.8114 - val_loss: 0.4010 - val_binary_accuracy: 0.8141
Epoch 38/50
80/80 [==============================] - 178s 2s/step - loss: 0.3970 - binary_accuracy: 0.8122 - val_loss: 0.4324 - val_binary_accuracy: 0.7874
Epoch 39/50
80/80 [==============================] - 178s 2s/step - loss: 0.3927 - binary_accuracy: 0.8206 - val_loss: 0.4286 - val_binary_accuracy: 0.7886
Epoch 40/50
80/80 [==============================] - 178s 2s/step - loss: 0.3715 - binary_accuracy: 0.8319 - val_loss: 0.3697 - val_binary_accuracy: 0.8313
Epoch 41/50
80/80 [==============================] - 178s 2s/step - loss: 0.3695 - binary_accuracy: 0.8323 - val_loss: 0.3497 - val_binary_accuracy: 0.8468
Epoch 42/50
80/80 [==============================] - 177s 2s/step - loss: 0.3555 - binary_accuracy: 0.8339 - val_loss: 0.3561 - val_binary_accuracy: 0.8398
Epoch 43/50
80/80 [==============================] - 178s 2s/step - loss: 0.3560 - binary_accuracy: 0.8403 - val_loss: 0.3415 - val_binary_accuracy: 0.8480
Epoch 44/50
80/80 [==============================] - 179s 2s/step - loss: 0.3382 - binary_accuracy: 0.8419 - val_loss: 0.3462 - val_binary_accuracy: 0.8420
Epoch 45/50
80/80 [==============================] - 178s 2s/step - loss: 0.3422 - binary_accuracy: 0.8487 - val_loss: 0.3704 - val_binary_accuracy: 0.8337
Epoch 46/50
80/80 [==============================] - 176s 2s/step - loss: 0.3377 - binary_accuracy: 0.8407 - val_loss: 0.4439 - val_binary_accuracy: 0.8165
Epoch 47/50
80/80 [==============================] - 177s 2s/step - loss: 0.3329 - binary_accuracy: 0.8468 - val_loss: 0.3515 - val_binary_accuracy: 0.8549
Epoch 48/50
80/80 [==============================] - 177s 2s/step - loss: 0.3170 - binary_accuracy: 0.8538 - val_loss: 0.3272 - val_binary_accuracy: 0.8630
Epoch 49/50
80/80 [==============================] - 178s 2s/step - loss: 0.2945 - binary_accuracy: 0.8667 - val_loss: 0.3602 - val_binary_accuracy: 0.8525
Epoch 50/50
80/80 [==============================] - 178s 2s/step - loss: 0.2969 - binary_accuracy: 0.8587 - val_loss: 0.3371 - val_binary_accuracy: 0.8594
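Overfitting can be judged from the plots, but it can also be read numerically from `history.history`. This is the dict of per-epoch metric lists that `model.fit` returns. The plain-Python sketch below reports the epoch with the lowest validation loss and the final gap between validation and training loss. `summarize_history` is a hypothetical helper, and the sample values are epochs 48 to 50 copied from the log above.

```python
def summarize_history(hist, first_epoch=1):
    """Summarize a Keras-style history dict (metric name -> list of per-epoch values).

    first_epoch: epoch number of the first entry, so results line up with the log.
    """
    val_loss = hist["val_loss"]
    best = min(range(len(val_loss)), key=lambda i: val_loss[i])  # index of lowest val_loss
    return {
        "best_epoch": first_epoch + best,
        "best_val_loss": val_loss[best],
        # a large positive gap (validation loss well above training loss) suggests overfitting
        "final_gap": val_loss[-1] - hist["loss"][-1],
    }

# Epochs 48-50, copied from the training log above
tail = {
    "loss":     [0.3170, 0.2945, 0.2969],
    "val_loss": [0.3272, 0.3602, 0.3371],
}
print(summarize_history(tail, first_epoch=48))
```

On these three epochs it picks epoch 48 as the best (validation loss 0.3272) and reports a final gap of about 0.04. On the full run, `summarize_history(history.history)` would do the same over all 50 epochs.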

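As we can see, accuracy improves substantially over the model trained without augmentation. Validation accuracy reaches about 86% by epoch 50, level with training accuracy (0.859 for both), and validation loss stays close to training loss (0.337 vs. 0.297). The model is no longer overfitting.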
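One setting in the compile step deserves a note. `Adam(epsilon=0.01)` uses an epsilon far larger than the Keras default of `1e-7`. The sketch below computes Adam's first update for a single weight to show what that changes. It assumes the "epsilon hat" form of the update described in the Keras `Adam` documentation, and the default `learning_rate=0.001`, `beta_1=0.9` and `beta_2=0.999`, which the notebook leaves unchanged. `adam_step_size` is a hypothetical helper, not a Keras function.

```python
import math

def adam_step_size(grad, epsilon, lr=0.001, beta_1=0.9, beta_2=0.999):
    """Size of Adam's first update (t = 1) for one weight with gradient `grad`.

    Assumes the "epsilon hat" form:
      m = (1 - beta_1) * g,   v = (1 - beta_2) * g**2       (moments after step 1)
      lr_t = lr * sqrt(1 - beta_2) / (1 - beta_1)           (bias correction at t = 1)
      update = lr_t * m / (sqrt(v) + epsilon)
    """
    m = (1 - beta_1) * grad
    v = (1 - beta_2) * grad ** 2
    lr_t = lr * math.sqrt(1 - beta_2) / (1 - beta_1)
    return lr_t * m / (math.sqrt(v) + epsilon)

for eps in (1e-7, 0.01):      # Keras default vs. the value used in this notebook
    for g in (1.0, 0.01):     # a large and a small gradient
        print(f"epsilon={eps:g}  grad={g:g}  first step={adam_step_size(g, eps):.6f}")
```

With the default epsilon, both gradients give a first step of about the learning rate (0.001). With `epsilon=0.01` the step for a gradient of 1.0 shrinks modestly to about 0.00076. For a gradient of 0.01 it drops to roughly 0.00003. A large epsilon therefore damps updates for small gradients and makes Adam behave somewhat closer to SGD with momentum.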
